Model comparisons

Hypothesis testing and ROPEs

Elizabeth King
Kevin Middleton

This week

  1. Hypothesis testing and ROPEs
  2. Information and WAIC
  3. Importance Sampling and PSIS-LOOCV
  4. Model averaging

Making Decisions in Data Analysis

  • Parameter estimation
    • means, medians, intervals
  • Hypothesis testing
    • P-values
    • AIC, AICc
    • Credible intervals, HDIs
    • ROPEs
    • LOO-CV values

Bayesian and frequentist approaches

Frequentist / Maximum likelihood:

  • Parameters fixed, data varies
  • \(Pr[Data | \theta]\)

Bayesian:

  • Data fixed, parameters vary
  • \(Pr[\theta | Data]\)

\[Pr[\theta | Data] = \frac{Pr[Data | \theta] \times Pr[\theta]}{Pr[Data]}\]
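Bayes' rule can be made concrete with a minimal grid-approximation sketch in base R (toy numbers with an assumed normal likelihood and prior; nothing here comes from the mole rat data):

```r
# Grid approximation of Pr[theta | Data] for a normal mean (toy example)
theta <- seq(-3, 3, length.out = 601)      # grid of candidate parameter values
prior <- dnorm(theta, mean = 0, sd = 1)    # Pr[theta]
y <- c(0.8, 1.1, 0.9)                      # toy data

# Pr[Data | theta] at each grid point (known sd = 1 for simplicity)
like <- sapply(theta, function(th) prod(dnorm(y, mean = th, sd = 1)))

# Pr[theta | Data]: likelihood x prior, normalized over the grid
post <- like * prior / sum(like * prior)

theta[which.max(post)]   # posterior mode: shrunk from mean(y) toward the prior
```

With these numbers the posterior mode sits between the sample mean (0.93) and the prior mean (0), illustrating how the prior and likelihood combine.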

Energy expenditure in naked mole rats


ANCOVA: intercepts varying

fm <- lm(Energy ~ Mass + Caste, data = M)

Bayesian model

  • Fit separate intercepts for each Caste by suppressing the overall intercept with - 1
  • Note that many samples are necessary for stable Bayes factors (Schad et al. 2022)
    • 80,000 total post-warmup here
fm <- brm(Energy ~ Mass + Caste - 1, data = M,
          prior = prior(normal(0, 3), class = b),
          iter = 4e4,
          cores = 4,
          seed = 45873495)

Parameter estimates of interest

  1. Shared slope for Mass
  2. “Intercept” difference of Worker vs. Non-worker castes
    • Overall intercepts for Caste probably not interesting in this example
    • What does Mass = 0 mean for a naked mole rat?

HDI-based tests

  • Against 0
  • Against an a priori value
    • e.g., Isometric slope
  • Choose an interval (preferably not 95%)
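The HDI is the narrowest interval containing the stated posterior mass. For unimodal draws it can be found by sliding a fixed-width window across the sorted samples, as in this base-R sketch (simulated draws chosen to mimic the Mass slope summary below, not taken from a fitted model):

```r
# Narrowest interval containing `width` of the draws (assumes a unimodal posterior)
hdi_from_draws <- function(x, width = 0.89) {
  x <- sort(x)
  n <- length(x)
  m <- ceiling(width * n)          # number of draws the interval must contain
  lowers <- x[1:(n - m + 1)]       # candidate lower bounds
  uppers <- x[m:n]                 # matching upper bounds
  i <- which.min(uppers - lowers)  # pick the narrowest window
  c(lower = lowers[i], upper = uppers[i])
}

set.seed(1)
hdi_from_draws(rnorm(1e5, mean = 0.89, sd = 0.18))
```

For a symmetric posterior the HDI matches the equal-tailed interval; the two diverge when the posterior is skewed.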

HDI for slope

summary(fm, prob = 0.89)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Energy ~ Mass + Caste - 1 
   Data: M (Number of observations: 35) 
  Draws: 4 chains, each with iter = 40000; warmup = 20000; thin = 1;
         total post-warmup draws = 80000

Population-Level Effects: 
               Estimate Est.Error l-89% CI u-89% CI Rhat Bulk_ESS Tail_ESS
Mass               0.89      0.18     0.60     1.18 1.00    15525    19642
CasteNonWorker    -0.09      0.89    -1.51     1.33 1.00    15527    19894
CasteWorker        0.30      0.79    -0.96     1.56 1.00    15509    19288

Family Specific Parameters: 
      Estimate Est.Error l-89% CI u-89% CI Rhat Bulk_ESS Tail_ESS
sigma     0.31      0.04     0.25     0.38 1.00    24236    26368

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

HDI for differences

library(tidybayes)

post <- fm |> 
  spread_rvars(b_Mass, b_CasteWorker, b_CasteNonWorker) |> 
  mutate(d_W_NW = b_CasteWorker - b_CasteNonWorker, .keep = "unused")
head(post)
# A tibble: 1 × 2
        b_Mass       d_W_NW
    <rvar[1d]>   <rvar[1d]>
1  0.89 ± 0.18  0.39 ± 0.15
post |> 
  median_hdi(b_Mass, d_W_NW, .width = 0.89)
# A tibble: 1 × 9
  b_Mass b_Mass.lower b_Mass.upper d_W_NW d_W_NW…¹ d_W_N…² .width .point .inte…³
   <dbl>        <dbl>        <dbl>  <dbl>    <dbl>   <dbl>  <dbl> <chr>  <chr>  
1  0.890        0.604         1.18  0.391    0.161   0.626   0.89 median hdi    
# … with abbreviated variable names ¹​d_W_NW.lower, ²​d_W_NW.upper, ³​.interval

HDI Plot

Many interesting visualizations in the see package

library(bayestestR)
library(see)

HD_interval <- bayestestR::hdi(fm, ci = 0.89)
plot(HD_interval)


Hypothesis tests with hypothesis()

Similar to generalized linear hypothesis tests (glht() in multcomp). Note that alpha is the significance level, so alpha = 0.11 would give an 89% CI; the alpha = 0.89 used here produces the narrow 11% CI seen in the output.

hypothesis(fm, "CasteWorker - CasteNonWorker = 0", alpha = 0.89)
Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (CasteWorker-Cast... = 0     0.39      0.15     0.37     0.41         NA
  Post.Prob Star
1        NA    *
---
'CI': -78%-CI for one-sided and 11%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 11%;
for two-sided hypotheses, the value tested against lies outside the 11%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

Estimate difference while sampling?

  • gq> statement in ulam() model
  • Pure Stan code, run with rstan or cmdstanr
  • generated quantities block lets you access the samples at each iteration
    • Calculations, predictions, etc., which all go into the output
make_stancode(Energy ~ Mass + Caste - 1,
              data = M,
              prior = prior(normal(0, 3), class = b))


// generated with brms 2.18.0
functions {
  
}
data {
  int<lower=1> N; // total number of observations
  vector[N] Y; // response variable
  int<lower=1> K; // number of population-level effects
  matrix[N, K] X; // population-level design matrix
  int prior_only; // should the likelihood be ignored?
}
transformed data {
  
}
parameters {
  vector[K] b; // population-level effects
  real<lower=0> sigma; // dispersion parameter
}
transformed parameters {
  real lprior = 0; // prior contributions to the log posterior
  lprior += normal_lpdf(b | 0, 3);
  lprior += student_t_lpdf(sigma | 3, 0, 2.5)
            - 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += normal_id_glm_lpdf(Y | X, 0, b, sigma);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  
}
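For example, the empty generated quantities block above could compute the caste difference at each iteration. This is a hypothetical snippet: it assumes b[2] and b[3] correspond to the CasteNonWorker and CasteWorker columns of the design matrix X, so check the column order (e.g., with colnames(model.matrix(~ Mass + Caste - 1, M))) before trusting the indices.

```stan
generated quantities {
  // hypothetical: assumes b[2] = CasteNonWorker and b[3] = CasteWorker,
  // matching the column order of the design matrix X
  real d_W_NW = b[3] - b[2];
}
```

The difference is then stored with the draws at no extra cost, rather than being computed from the posterior afterward.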

Bayes factors

  • Closest Bayesian analogue for a P-value
  • Savage-Dickey ratio: compare the density (height) of the posterior for \(\theta\) to the density of the prior for \(\theta\) at the point of interest (null = 0). bayestestR reports prior density / posterior density, so BF > 1 indicates evidence against the null.
  • Popular in psychology
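The ratio can be sketched in base R with simulated draws (a normal(0, 3) prior as in the model above; the "posterior" here is simulated to mimic the CasteWorker summary, not taken from fm — in practice you would use the actual posterior draws):

```r
# Savage-Dickey density ratio at the null value 0
set.seed(2)
post_draws <- rnorm(8e4, mean = 0.30, sd = 0.79)  # stand-in for the posterior of b_CasteWorker

dens   <- density(post_draws)                     # kernel density estimate of the posterior
post0  <- approx(dens$x, dens$y, xout = 0)$y      # posterior density at theta = 0
prior0 <- dnorm(0, mean = 0, sd = 3)              # prior density at theta = 0

prior0 / post0   # BF < 1 here: the null became *more* probable after seeing the data
```

Because the posterior piles up density near 0 relative to the diffuse prior, the ratio is well below 1, in line with the small BF for CasteWorker reported below.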

Bayes factors

Can also be used for:

  • Directional tests (one-sided tests)
  • Intervals (ROPE-like)
  • Models vs. models

Bayes factors

“While Bayes factors provide an immediate approach to hypothesis testing, they are highly sensitive to details of the data/model assumptions.” (Schad et al. 2022)

(BF <- bayesfactor_parameters(fm, null = 0))
Bayes Factor (Savage-Dickey density ratio)

Parameter      |       BF
-------------------------
Mass           | 3.37e+03
CasteNonWorker |    0.293
CasteWorker    |    0.286

* Evidence Against The Null: 0

Bayes factors

plot(BF)

Region of practical equivalence

“A difference of zero plus or minus some small amount is for practical purposes ‘not different’ from zero.”

Easystats has a good introduction.


rope()

  • Use rope_range() to automatically find the most suitable range for comparison.
    • Think about what interval you consider practically equivalent
  • Recommended to use the entire posterior to determine the percentage in the ROPE (ci = 1)
fm |> 
  spread_rvars(b_Mass, b_CasteWorker, b_CasteNonWorker) |> 
  rename(Mass = b_Mass) |> 
  mutate(d_W_NW = b_CasteWorker - b_CasteNonWorker, .keep = "unused") |> 
  rope(ci = 1, range = c(-0.25, 0.25))

rope()

# Proportion of samples inside the ROPE [-0.25, 0.25]:

Parameter | inside ROPE
-----------------------
Mass      |      0.03 %
d_W_NW    |     16.16 %
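With ci = 1, the percentage inside the ROPE is simply the share of all posterior draws falling in the interval, as in this base-R sketch (simulated draws chosen to mimic the d_W_NW summary above, not taken from fm):

```r
# Proportion of posterior draws inside the ROPE [-0.25, 0.25]
set.seed(3)
d_draws <- rnorm(8e4, mean = 0.39, sd = 0.15)  # stand-in for the d_W_NW posterior

mean(d_draws >= -0.25 & d_draws <= 0.25)       # compare to the rope() output above
```

A small proportion inside the ROPE is evidence that the difference is not practically equivalent to zero.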

ROPE plot

rope(fm, ci = 1, range = c(-0.25, 0.25)) |> 
  plot()


References

Kruschke, J. K. 2010. What to Believe: Bayesian Methods for Data Analysis. Trends Cogn. Sci. 14:293–300.
Kruschke, J. K., H. Aguinis, and H. Joo. 2012. The Time Has Come: Bayesian Methods for Data Analysis in the Organizational Sciences. Organizational Research Methods 15:722–752. SAGE Publications Inc.
Kruschke, J. K. 2015. Doing Bayesian Data Analysis: a Tutorial with R, JAGS, and Stan. 2nd ed. Academic Press, Boston, MA.
Kruschke, J. K. 2018. Rejecting or Accepting Parameter Values in Bayesian Estimation. Advances in Methods and Practices in Psychological Science 1:270–280. SAGE Publications Inc.
Schad, D. J., B. Nicenboim, P.-C. Bürkner, M. Betancourt, and S. Vasishth. 2022. Workflow Techniques for the Robust Use of Bayes Factors. Psychol. Methods, doi: 10.1037/met0000472.